Univariate Statistics

Lauren Talluto

14.01.2025

The central limit theorem revisited

Reminder: standard error of the mean:

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

We can also standardize (scale) the sample means:

\[ {z} = \frac{(\bar{x}-\mu)}{\frac{\sigma}{\sqrt{n}}} \]

These z-values follow a standard normal distribution (with mean=0 and sd=1).
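A quick simulation illustrates this; the values of \(\mu\), \(\sigma\), and \(n\) below are arbitrary choices for illustration:

```r
# Draw 10000 samples of size n = 25 from Normal(mu = 10, sigma = 2),
# standardize each sample mean, and check the resulting distribution
set.seed(1)
z = replicate(10000, (mean(rnorm(25, mean = 10, sd = 2)) - 10) / (2 / sqrt(25)))
c(mean(z), sd(z))  # should be close to 0 and 1
```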

If we know \(\mu\) and \(\sigma\), we can define an interval around \(\mu\) that contains a given fraction of all sample means.

Putting the central limit theorem to work: Confidence intervals

\[ z = \frac{(\bar{x}-\mu)}{\frac{\sigma}{\sqrt{n}}} \]

\[ P(-z_\frac{\alpha}{2}\leq \frac{(\bar{x}-\mu)}{\frac{\sigma}{\sqrt{n}}}\leq+z_\frac{\alpha}{2})=1-\alpha \]

\[ P(\bar{x}-z_\frac{\alpha}{2}\frac{\sigma}{\sqrt{n}}\leq \mu \leq\bar{x}+z_\frac{\alpha}{2}\frac{\sigma}{\sqrt{n}})=1-\alpha \]

\[ CI: \bar{x}\pm z_\frac{\alpha}{2}\frac{\sigma}{\sqrt{n}} \]
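In R, this interval can be computed with qnorm; the numbers below are hypothetical, and \(\sigma\) is assumed known:

```r
# 95% CI for mu with known sigma (all values hypothetical)
xbar = 25; sigma = 4; n = 30; alpha = 0.05
z_half = qnorm(1 - alpha/2)  # upper alpha/2 quantile, ~1.96
(ci = xbar + c(-1, 1) * z_half * sigma / sqrt(n))
```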


Conversely, we can compute an interval around any observed sample mean which includes \(\mu\) with a certain probability. This probability, called the confidence level, should be close to 1; its complement is called \(\alpha\).


A confidence interval is constructed symmetrically around a sample mean \(\bar{x}\). With high probability (usually 95%), \(\mu\) is included in intervals constructed this way around \(\bar{x}\).

Graphically, this is just a translation of the density distribution of the means, re-centered from \(\mu\) to \(\bar{x}\).

Cool! One sample is enough for a good interval estimate of \(\mu\). But wait: how do we know \(\sigma\)?!

The t-distribution

If the S.E.M. is estimated using the sample standard deviation, then standardized sample means follow a Student's t-distribution:

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}} = S.E.M. \]

\[ {t} = \frac{(\bar{x}-\mu)}{\frac{s}{\sqrt{n}}} \]

The t-distribution is wider (i.e., has fatter tails) when n is low, but converges to a normal distribution as n grows. Its shape is determined by d.f. = n - 1. The corresponding functions in R are qt, pt, dt, and rt.

Above 100 df, t is basically equivalent to a normal distribution.
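This convergence is easy to check by comparing t-quantiles with the corresponding normal quantile:

```r
# upper 2.5% quantile of t for increasing df, versus the normal quantile
sapply(c(2, 10, 30, 100, 1000), function(nu) qt(0.975, df = nu))
qnorm(0.975)  # ~1.96; qt approaches this value as df grows
```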

Confidence intervals: accuracy, precision, sample size

The t-distribution can be used to construct a confidence interval:

\[ {t} = \frac{(\bar{x}-\mu)}{\frac{s}{\sqrt{n}}} \]

\[ P(\bar{x}-t_{\frac{\alpha}{2},d.f.}\frac{s}{\sqrt{n}}\leq \mu \leq\bar{x}+t_{\frac{\alpha}{2},d.f.}\frac{s}{\sqrt{n}})=1-\alpha \]

\[ CI: \bar{x}\pm t_{\frac{\alpha}{2},d.f.} \frac{s}{\sqrt{n}} \]

Allows an interval estimate of \(\mu\) with an estimate for \(\sigma\) derived from the same sample as \(\bar{x}\).


data = read.table("../datasets/Glaciers.txt", header=TRUE)
# radiocarbon in dissolved organic carbon from glacier ice samples 
d14C = data$delta14C.permil

# Give mean and confidence interval with alpha=0.05 #
(m = mean(d14C))
## [1] -426.2994
(s = sd(d14C))
## [1] 143.8792
(n = length(d14C))
## [1] 19
alpha = 0.05
accuracy = 1-alpha

(t = qt(p = 1 - alpha/2, df = n-1)) # two-sided, P goes 50/50 to both tails
## [1] 2.100922
(aprec = t*s/sqrt(n)) # absolute precision = half the CI width
## [1] 69.34756
(rprec = aprec/m*100) # relative precision in %
## [1] -16.26734

# confidence limits
c(lower = m - aprec, upper = m + aprec)
##     lower     upper 
## -495.6469 -356.9518

Confidence intervals: accuracy, precision, sample size


\[ AP= t_{\frac{\alpha}{2},d.f.} \frac{s}{\sqrt{n}} \]

\[ RP= \frac{t_{\frac{\alpha}{2},d.f.}}{\bar{x}} \frac{s}{\sqrt{n}} \]

Confidence intervals: accuracy, precision, sample size

\[ n= \left(\frac{t_{\frac{\alpha}{2},d.f.}\cdot{s}}{{RP\cdot\bar{x}}}\right)^2 \]

We must solve this formula iteratively (\(n\) is on both sides)

# Regard data as pilot study
# Set accuracy and rprec as needed
accuracy = .95 # <90-95 % is questionable
rprec = 0.10 # assume more (or less) precise estimate is needed

alpha = 1-accuracy

n = 2   # left side of formula, low n as starting value
t = qt(1 - alpha/2, n-1)  
(rhs = (t * s/(rprec * m))^2)  # right side of formula
## [1] 1839.071

# right side minus left side, you want this to be close to zero, 
# when diff falls below zero - then enough sampling effort 
diff = rhs - n

while(diff >= 0) {
    n = n + 1
    t = qt(1 - alpha/2, n - 1)
    diff = ((t * s/(rprec * m))^2) - n
}
n # -> this is the needed sample size (minimum) to achieve the desired precision and accuracy
## [1] 47


The point where a function crosses zero is the function’s root. R has a built-in function uniroot that will find the root for us; rounding the root up gives the required sample size.

f = function(n, alpha = 0.05, rprec = 0.1) {
    t = qt(1 - alpha/2, n-1)
    ((t * s/(rprec * m))^2) - n
}
uniroot(f, lower = 2, upper = 1000)
## $root
## [1] 46.19821
## 
## $f.root
## [1] -1.037638e-05
## 
## $iter
## [1] 8
## 
## $init.it
## [1] NA
## 
## $estim.prec
## [1] 6.103516e-05

Accuracy and precision illustrated

Statistical decision theory

… at the heart of inferential statistics: learning about the population from sample(s).


Example:

H0: The two populations do not differ. Any observed difference between the two samples is entirely due to chance: if the studied property is naturally variable, two samples will never be identical.

HA: The two populations differ. An observed difference between the two samples reflects this difference; the samples were drawn from two different underlying populations.



A decision is made by rejecting or accepting the hypotheses. Our aim is to reject H0 and thereby accept HA (the other direction is harder).

Now imagine you observe a difference between two samples. If asked to make a decision, you would probably look at the absolute observed difference (sample mean B is X times larger than sample mean A), and maybe also try to take into account the observable variation within the two samples. Make a decision ;-)

Type I & II Errors

When making a decision based on (incomplete) sample information you can make an error!

There are two types of error: \(\alpha\) (type I), \(\beta\) (type II)

Type I & II Errors

  • Type I error: false alarm (or false positive)
    • There is no real effect, but you think there is one.
  • Type II error: missed opportunity (or false negative)
    • There is a real effect, but you failed to find it.

Type I & II Errors

Power (\(1- \beta\)): the probability of correctly identifying an existing real effect.


Power increases

  • as the strength of the effect increases (e.g., larger difference between populations)
  • as the population variance decreases
  • as we increase alpha
  • as we increase sample size
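These relationships can be explored with R's built-in power.t.test; the effect size, standard deviation, and sample sizes below are arbitrary illustrative values:

```r
# power of a two-sample t-test under various settings
power.t.test(n = 10, delta = 1, sd = 2, sig.level = 0.05)$power  # baseline
power.t.test(n = 40, delta = 1, sd = 2, sig.level = 0.05)$power  # larger n
power.t.test(n = 10, delta = 2, sd = 2, sig.level = 0.05)$power  # stronger effect
power.t.test(n = 10, delta = 1, sd = 1, sig.level = 0.05)$power  # lower variance
power.t.test(n = 10, delta = 1, sd = 2, sig.level = 0.10)$power  # larger alpha
```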

Statistical hypothesis testing - strategy

Consider the example from before as a classical situation of a two sample-test:

H0: The two populations do not differ. Any observed difference between the two samples is entirely due to chance: if the studied property is naturally variable, two samples will never be identical.

HA: The two populations differ. An observed difference between the two samples reflects this difference; the samples were drawn from two different underlying populations.



To test, we have collected empirical information, i.e. each population is represented by 1 sample (of some size).

Statistical hypothesis testing - strategy


  1. We assume H0 is true.
    • We are suspicious, however. Maybe the samples are really different.
    • Thus we compute \(p\), the probability that, if H0 is true, we would get a result at least as extreme as we did.
  2. Compare \(p\) to a pre-set error threshold (\(\alpha\)).
  3. If \(p < \alpha\) (i.e., our empirical results are VERY unlikely under H0), we decide H0 cannot be true and reject it.

Mechanics of testing illustrated step-by-step: one-sample test

Example: Mice population on an island. We do a census and identify body size (as weight) of ALL MICE.

Weight ~ Normal(\(\mu_0\), \(\sigma\))

On the way home between the island and the mainland we find a single mouse on a drifting log; it is surprisingly large.

Is it from the island? From the mainland? We will use a one-sample test with sample size n = 1.


data = read.table("../datasets/IslandMice.txt", header=TRUE)
x = data$weight
x1 = 13 # Drifting mouse (x1)
(mu = mean(x)) # population mean of island mice
## [1] 10.04951
(n = length(x)) # population size
## [1] 999
(sigma = sqrt(sum((x - mu)^2)/n)) # population sd (sigma)
## [1] 2.04207


Testable hypotheses:

H0: The ‘new’ mouse belongs to the island population, and its weight is similar to those of other island mice: its relatively high weight is entirely due to chance; it's just a somewhat heavy mouse from the population.

HA: The ‘new’ mouse does not belong to the island population. Its weight is higher than that of other island mice; it must belong to some other mouse population, say from the mainland.

Mechanics of testing: assuming H0 but remaining suspicious

Mechanics of testing: evaluating probabilities

\[ {TS} = z_1 = \frac{(x_1-\mu)}{\sigma} \]

\[ p = P(z\geq z_1) = ? \]

Assume our population of body weights is normally distributed.

We can compute \(p\) as the integral of the normal PDF.

Mechanics of testing: assessing significance


# z-score for our drifter
(z1 = (x1 - mu)/sigma)
## [1] 1.444853

# Probability to find a mouse as heavy or 
# heavier than x1 in the island population
# first on the standardized (z-) scale
(p = 1 - pnorm(z1, mean = 0, sd = 1))
## [1] 0.07424962
# and again on the original scale
(1 - pnorm(x1, mean = mu, sd = sigma))
## [1] 0.07424962

The probability of finding a mouse as heavy as (or heavier than) the drifting mouse in the island population is 7.4%. Since this exceeds \(\alpha = 0.05\), we fail to reject the null hypothesis that the mouse is from the island.

Mechanics of testing: assessing error

| Decision | Reality: H\(_0\) true | Reality: H\(_0\) false |
| --- | --- | --- |
| accept H\(_0\) | correct decision | type II error (β) |
| reject H\(_0\) | type I error (α) | correct decision |

For every test statistic, there exists a critical value \(z_{crit}\) where \(P(z \ge z_{crit}) = \alpha\).

\(z_1 > z_{crit}\)

Decision: reject H\(_0\) (the result is significant).

\(z_1 < z_{crit}\)

Decision: do not reject H\(_0\).

Mechanics of testing: computing beta and power

A well-defined HA could be: the mouse comes from the mainland, where

\(weight \sim \mathcal{N}(\mu_{main}, \sigma)\), with \(\mu_0 < \mu_{main}\).

# Read file of mainland mice
data_main = read.table("../datasets/MainlandMice.txt", header=TRUE)
(mu_main = mean(data_main$weight))
## [1] 14.98701
c(sd(data_main$weight), sigma) # sd of mainland mice is ~identical to island mice
## [1] 2.00143 2.04207

# critical value on original scale
(x_crit = qnorm(0.95, mu, sigma)) 
## [1] 13.40842

# Type II error rate (orange)
# Probability to find a mouse lighter than xcrit
# under the assumption that HA is right
# (i.e., the mouse comes from the mainland)
(beta = pnorm(x_crit, mu_main, sigma))
## [1] 0.2197506

# Power is just 1 - beta
(1 - beta)
## [1] 0.7802494

Mechanics of testing: controlling decision errors


Mechanics of testing: increasing sample size

What if we find more than one mouse?

A sample of n > 1 allows us to compute test statistics on sample means!

Sampling with higher effort decreases both errors and thus allows a conservative test with more power.

Mechanics of testing: increasing sample size

# Sample of 3 mice: do they belong to the island population?
x_sample = c(12.09, 15.48, 11.76)
(xbar = mean(x_sample))
## [1] 13.11
(SEM = sigma/sqrt(length(x_sample)))
## [1] 1.17899

# Standardised weight of the sample mean
# assuming it belongs to a "population of island means"
(z_sample = (xbar - mu)/SEM)
## [1] 2.595859

# Probability to find a mean at least as great as (or greater than)
# the sample mean (of 3 drifting mice) in the population of island means
(p = 1 - pnorm(z_sample, 0,1))
## [1] 0.004717744
# equivalent to:
(p = 1 - pnorm(xbar, mean = mu, sd = SEM))
## [1] 0.004717744

# Power (1 - beta); note x_crit is still the critical value from the n = 1 case
1 - pnorm(x_crit, mean = mu_main, sd = SEM)
## [1] 0.9097045

One-sample z-test

\[ TS =z= \frac{(\bar{x}-\mu)}{\frac{\sigma}{\sqrt{n}}} \]
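Base R has no dedicated z-test function, but the test is a one-liner; the numbers below are hypothetical, with \(\sigma\) assumed known:

```r
# one-sided z-test: is the population mean greater than mu0?
xbar = 10.8; mu0 = 10; sigma = 2; n = 36  # hypothetical values
z = (xbar - mu0) / (sigma / sqrt(n))
p = 1 - pnorm(z)  # upper-tail p-value
c(z = z, p = p)
```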

One-sample t-test

\[ TS =z= \frac{(\bar{x}-\mu)}{\frac{\sigma}{\sqrt{n}}} \]

\[ t = \frac{(\bar{x}-\mu)}{\frac{s}{\sqrt{n}}} \]

\[ t \sim \mathcal{T}(\nu) \]

One-sample t-test

# Body temperature of crabs
data = read.table("../datasets/Crabs.txt", header=TRUE)
x = data$body_temp

# null hypothesis: body temp is equal to air temp
# HA: body temp is greater (metabolic heat)
# one-sided test!
air_temp = 24.3

(xbar = mean(x))
## [1] 25.028
(s = sd(x))
## [1] 1.341802
n = length(x)
(SEM = s/sqrt(n))
## [1] 0.2683605

(t = (xbar - air_temp)/SEM)
## [1] 2.712769
(t_crit = qt(p = 0.95, df= n - 1))
## [1] 1.710882
(p = 1 - pt(t, n - 1))
## [1] 0.006072769

One-sample t-test

# Maybe simpler:
# note default alternative="two.sided"
t.test(x, mu = air_temp, alternative = "greater") 
## 
##  One Sample t-test
## 
## data:  x
## t = 2.7128, df = 24, p-value = 0.006073
## alternative hypothesis: true mean is greater than 24.3
## 95 percent confidence interval:
##  24.56887      Inf
## sample estimates:
## mean of x 
##    25.028

One-sample t-test

# confidence interval: 2-sided critical t-value
t_quantile = qt(1 - 0.05/2, df = n - 1)

# re-scale to the original measurement scale
precision = t_quantile * SEM

(interval = xbar + c(-precision, precision))
## [1] 24.47413 25.58187
# Air temperature is outside of this interval!

For two-sided tests, we can equivalently construct a confidence interval using the quantiles of the t-distribution.

Interpretation: if H0 is true (i.e., our population has mean \(\mu\)), 95% of intervals constructed from similar samples will include \(\mu\).

Our interval does not include \(\mu\), so we can reject the hypothesis that the population mean is \(\mu\).

Steps to conduct a (two-sided) statistical test.

Significance levels and example statements


One-sided vs two-sided hypotheses


Parametric tests

Normality checks

Graphical checks

Histograms with normal density curves

Compare mean and median
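A minimal sketch of these graphical checks, using simulated data in place of a real sample:

```r
set.seed(2)
x = rnorm(50, mean = 10, sd = 2)  # stand-in for real data
m = mean(x); s = sd(x)

# histogram with the fitted normal density overlaid
hist(x, freq = FALSE, main = "Histogram with normal curve")
curve(dnorm(x, mean = m, sd = s), add = TRUE, col = "red")

# QQ-plot: points near the line suggest approximate normality
qqnorm(x); qqline(x)

# mean vs. median: a large difference hints at skew
c(mean = m, median = median(x))
```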

Normality tests
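The standard formal test in R is the Shapiro-Wilk test (shapiro.test); a sketch with simulated data:

```r
set.seed(3)
x_norm = rnorm(50)  # data that really are normal
x_skew = rexp(50)   # clearly right-skewed data

shapiro.test(x_norm)$p.value  # usually large: no evidence against normality
shapiro.test(x_skew)$p.value  # very small: normality clearly rejected
```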


Conclusions/advice:

F-distribution

Another thought experiment:

  1. Take two samples from a population (\(\mathcal{N}[\mu, \sigma]\))
  2. Calculate \(s_1^2\) (from sample 1 with \(n_1\)) and \(s_2^2\) (from sample 2 with \(n_2\))
  3. Calculate test statistic F: \[ F=\frac{s_1^2}{s_2^2} \]
  4. Repeat 1.-3. and build distribution of F-values
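The thought experiment can be run as a simulation; \(\mu = 0\), \(\sigma = 1\), and \(n_1 = n_2 = 10\) are arbitrary choices here:

```r
set.seed(4)
# many F-values from repeated pairs of samples
F_vals = replicate(10000, var(rnorm(10)) / var(rnorm(10)))

# compare with the theoretical F density, df1 = df2 = 9
hist(F_vals, freq = FALSE, breaks = 60)
curve(df(x, df1 = 9, df2 = 9), add = TRUE, col = "red")
```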

F-test: Variance homogeneity?

H0: The sample variances estimate the same parametric variance. Or: \(\sigma_1^2 = \sigma_2^2\)

Variance homogeneity = homoscedasticity


HA: The sample variances estimate different parametric variances. Or: \(\sigma_1^2 \neq \sigma_2^2\)

Variance heterogeneity = heteroscedasticity


Only use this to test variance heterogeneity (HA). Do not use it to check whether two samples have the same variance (same problems as Shapiro test)!
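In R this F-test is var.test; the two samples below are hypothetical:

```r
set.seed(5)
g1 = rnorm(20, sd = 1)  # hypothetical sample 1
g2 = rnorm(20, sd = 2)  # hypothetical sample 2

var.test(g1, g2)  # H0: the ratio of the two variances equals 1
```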

Standard tests for location

Considerations:

Standard tests for location

Two-sample tests: independent t-test

\[ \begin{align} t &= \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} \\ \\ s_p &= \sqrt{\frac{(n_1-1)s^2_1 + (n_2-1)s^2_2}{n_1+n_2-2}} \\ \nu & = n_1 + n_2 - 2 \end{align} \]
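A quick numerical check of the pooled t-test against R's built-in version, using the general pooled variance (hypothetical samples; t.test needs var.equal = TRUE for the pooled test):

```r
set.seed(6)
x1 = rnorm(15, mean = 10)  # hypothetical sample 1
x2 = rnorm(20, mean = 11)  # hypothetical sample 2

n1 = length(x1); n2 = length(x2)
sp = sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))
(t_manual = (mean(x1) - mean(x2)) / (sp * sqrt(1/n1 + 1/n2)))

t.test(x1, x2, var.equal = TRUE)$statistic  # matches the manual value
```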

Two-sample tests: Welch's t-test (unequal variances)

\[\begin{align} t &= \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}} \\ \\ \nu & \approx \frac{\left( \frac{s^2_1}{n_1} + \frac{s^2_2}{n_2} \right )^2} {\frac{s^4_1}{n^2_1\nu_1} + \frac{s^4_2}{n^2_2\nu_2}} \\ \end{align}\]
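Note that t.test in R performs the Welch test by default; a sketch with hypothetical unequal-variance samples:

```r
set.seed(7)
a = rnorm(12, mean = 10, sd = 1)  # hypothetical sample, small spread
b = rnorm(25, mean = 10, sd = 3)  # hypothetical sample, large spread

t.test(a, b)                    # Welch: the default, fractional df
t.test(a, b, var.equal = TRUE)  # classic pooled test, df = n1 + n2 - 2
```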

Two-sample tests: paired t-test

x1 = c(gandalf = 6, saruman = 4, arwen = 7, frodo = 3)
x2 = c(gandalf = 4, saruman = 3, arwen = 5, frodo = 2)
x1 - x2
## gandalf saruman   arwen   frodo 
##       2       1       2       1

t.test(x1, x2, paired = TRUE)
## 
##  Paired t-test
## 
## data:  x1 and x2
## t = 5.1962, df = 3, p-value = 0.01385
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  0.5813069 2.4186931
## sample estimates:
## mean difference 
##             1.5
t.test(x1 - x2)
## 
##  One Sample t-test
## 
## data:  x1 - x2
## t = 5.1962, df = 3, p-value = 0.01385
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.5813069 2.4186931
## sample estimates:
## mean of x 
##       1.5

Nonparametric location tests

Independent samples: Mann-Whitney U-test

Dependent samples: Wilcoxon test

rank(c(x1, x2))
## gandalf saruman   arwen   frodo gandalf saruman   arwen   frodo 
##     7.0     4.5     8.0     2.5     4.5     2.5     6.0     1.0

wilcox.test(x1, x2)
## Warning in wilcox.test.default(x1, x2): cannot compute exact p-value with ties
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  x1 and x2
## W = 12, p-value = 0.3065
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(x1, x2, paired = TRUE)
## Warning in wilcox.test.default(x1, x2, paired = TRUE): cannot compute exact
## p-value with ties
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  x1 and x2
## V = 10, p-value = 0.09467
## alternative hypothesis: true location shift is not equal to 0